Anderson Banihirwe
I contribute to and maintain several libraries in the open source scientific Python stack, with a focus on improving the scalability of Python tools for handling terabyte-scale datasets on HPC and cloud platforms.
Education
B.S., Computer Systems Engineering
University of Arkansas at Little Rock
Little Rock, AR
2018 - 2014
Professional Experience
Software Engineer II
National Center for Atmospheric Research
Boulder, CO
present - 2020-10
- Created jupyter-forward, a JupyterLab port-forwarding utility that simplifies running Jupyter on remote resources.
- Served as a core developer of xarray, an open source library for working with multidimensional labeled datasets and arrays in Python.
Software Engineer I
National Center for Atmospheric Research
Boulder, CO
2020-09 - 2018-10
- Led the intake-esm project, a Python data cataloguing package for exploring and ingesting Earth system model datasets.
- Contributed to the core software stack powering the Pangeo Project, including xarray and Dask.
- Assisted with the development and deployment of live (virtual or in-person) and online, self-paced educational material.
Software Developer Intern
Quansight
Austin, TX
2018-09 - 2018-05
- Developed xndframes [1], a Pandas ExtensionDtype/Array backed by xnd [2], a container type that maps most Python values relevant for scientific computing directly to typed memory.
- Worked on integrating cuDF [3], a GPU dataframe library, with the Apache Arrow [4] library.
Data Science Intern
First Orion
Little Rock, AR
2018-04 - 2017-11
- Built scoring and predictive models with scikit-learn, Dask, and Apache Spark using First Orion’s proprietary telecommunication data.
Research Intern
National Center for Atmospheric Research
Boulder, CO
2017-08 - 2017-05
- Developed spark-xarray [5], a Python package that integrates PySpark and xarray for climate data analysis.
Selected Publications, Posters, and Talks
Cloud-Native Repositories for Big Scientific Data [6]
Computing in Science and Engineering
N/A
2020-11
- Authored with Ryan Abernathey, Tom Augspurger, et al.
Pangeo Benchmarking Analysis: Object Storage vs. POSIX File System [7]
Fifth International Parallel Data Systems Workshop @ SC 20
N/A
2020-10
- Authored with Haiying Xu and Kevin Paul.
The Pangeo Ecosystem: Interactive Computing Tools for the Geosciences: Benchmarking on HPC [8]
2019 Supercomputing Conference Workshop on Interactive High-Performance Computing
N/A
2020-01
- Authored with Tina Erica Odaka, Guillaume Eynard-Bontemps, Aurelien Ponte, Guillaume Maze, Kevin Paul, Jared Baker, Ryan Abernathey.
Zarr: chunked, compressed, multidimensional arrays [9]
2020 Cloud Native Geospatial Outreach Day
Online
2020-09
- Invited talk about Zarr [10], an open source data format for the storage of chunked, compressed, multidimensional arrays.
Intake-ESM – Making It Easier To Consume Climate and Weather Data [11]
2020 ESIP Summer Meeting
Online
2020-07
- Invited talk about intake-esm, an intake plugin for working with Earth System Model (ESM) datasets.
Interactive Supercomputing with Dask and Jupyter [13]
2019 Scientific Computing with Python conference
Austin, TX
2019-07
- Contributed talk about Dask and Jupyter.
Beyond Matplotlib - Tutorial: Building Interactive Climate Data Visualizations with Bokeh and Friends [14]
2018 UCAR Software Engineering Assembly conference
Boulder, CO
2018-04
- Contributed tutorial about interactive visualization with Python.
PySpark for “Big” Atmospheric Data Analysis
Eighth Symposium on Advances in Modeling and Analysis Using Python
Austin, TX
2018-01
- Contributed talk about spark-xarray [15].
Links
1. https://github.com/xnd-project/xndframes
2. https://github.com/xnd-project
3. https://github.com/rapidsai/cudf
4. https://arrow.apache.org/
5. https://ncar.github.io/PySpark4Climate/
6. https://www.authorea.com/doi/full/10.22541/au.160443768.88917719
7. https://doi.org/10.31223/X5ZW2T
8. https://doi.org/10.1007/978-3-030-44728-1_12
9. https://talks.andersonbanihirwe.dev/zarr-cloud-native-geospatial-2020.html
10. https://github.com/zarr-developers
11. https://talks.andersonbanihirwe.dev/intake-esm-esip-2020.html
12. https://www.researchgate.net/profile/Mariofanna_Milanova/publication/333414231_Perceptual_Judgments_to_Detect_Computer_Generated_Forged_Faces_in_Social_Media/links/5e2c963092851c3aaddac2f5/Perceptual-Judgments-to-Detect-Computer-Generated-Forged-Faces-in-Social-Media.pdf
13. https://youtu.be/vhawO8fgD64
14. https://sea.ucar.edu/event/beyond-matplotlib-building-interactive-climate-data-visualizations-bokeh-and-friends
15. https://ncar.github.io/PySpark4Climate/sparkxarray/overview/